452 research outputs found

    GPU System Optimization for Efficient System Resource Utilization of General-Purpose GPU Computing Applications in a Multitasking Environment

    Doctoral dissertation, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, August 2020. 염헌영.

    Recently, general-purpose GPU (GPGPU) applications have come to play key roles in many research fields, such as high-performance computing (HPC) and deep learning (DL). What these applications have in common is that they all require massive computation power, which matches the highly parallel design of the graphics processing unit (GPU). However, because the resource usage pattern of each GPGPU application varies, a single application cannot fully exploit the GPU system's resources to reach the GPU's best performance: the GPU system is designed to provide system-level fairness to all applications rather than to optimize for a specific type. GPU multitasking can address this issue by co-locating multiple kernels with diverse resource usage patterns so that they share the GPU resources in parallel. However, the current GPU multitasking scheme focuses only on co-launching the kernels rather than on making them execute more efficiently. Moreover, the current GPU multitasking scheme is not open-sourced, which makes it harder to optimize, since the GPGPU applications and the GPU system are unaware of each other's features. In this dissertation, we claim that support from a framework placed between the GPU system and the GPGPU applications, without modifying the applications, can yield better performance. We design and implement the framework while addressing two issues in GPGPU applications. First, we introduce a GPU memory checkpointing approach between host memory and device memory to address the problem that GPU memory cannot be oversubscribed in a multitasking environment. Second, we present a fine-grained GPU kernel management scheme to avoid GPU resource under-utilization in a multitasking environment. We implement and evaluate our schemes on a real GPU system. The experimental results show that our proposed approaches solve these problems more effectively than existing approaches while delivering better performance.

    Contents:
    Chapter 1 Introduction
      1.1 Motivation
      1.2 Contribution
      1.3 Outline
    Chapter 2 Background
      2.1 Graphics Processing Unit (GPU) and CUDA
      2.2 Checkpoint and Restart
      2.3 Resource Sharing Model
      2.4 CUDA Context
      2.5 GPU Thread Block Scheduling
      2.6 Multi-Process Service with Hyper-Q
    Chapter 3 Checkpoint based solution for GPU memory over-subscription problem
      3.1 Motivation
      3.2 Related Work
      3.3 Design and Implementation
        3.3.1 System Design
        3.3.2 CUDA API wrapping module
        3.3.3 Scheduler
      3.4 Evaluation
        3.4.1 Evaluation setup
        3.4.2 Overhead of FlexGPU
        3.4.3 Performance with GPU Benchmark Suites
        3.4.4 Performance with Real-world Workloads
        3.4.5 Performance of workloads composed of multiple applications
      3.5 Summary
    Chapter 4 A Workload-aware Fine-grained Resource Management Framework for GPGPUs
      4.1 Motivation
      4.2 Related Work
        4.2.1 GPU resource sharing
        4.2.2 GPU scheduling
      4.3 Design and Implementation
        4.3.1 System Architecture
        4.3.2 CUDA API Wrapping Module
        4.3.3 smCompactor Runtime
        4.3.4 Implementation Details
      4.4 Analysis on the relation between performance and workload usage pattern
        4.4.1 Workload Definition
        4.4.2 Analysis on performance saturation
        4.4.3 Predict the necessary SMs and thread blocks for best performance
      4.5 Evaluation
        4.5.1 Evaluation Methodology
        4.5.2 Overhead of smCompactor
        4.5.3 Performance with Different Thread Block Counts on Different Number of SMs
        4.5.4 Performance with Concurrent Kernel and Resource Sharing
      4.6 Summary
    Chapter 5 Conclusion
    Abstract in Korean (요약)
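The checkpointing idea in the abstract above can be pictured with a toy model: when co-located tasks need more device memory than exists, the buffers of a task that is not running are copied to host memory and restored when that task runs again. The sketch below is a pure-Python simulation under that assumption. The class name `ToyDeviceMemory`, the oldest-first victim choice, and the sizes are hypothetical and are not taken from the dissertation's implementation.

```python
class ToyDeviceMemory:
    """Toy simulation of device-memory oversubscription via checkpointing.

    Hypothetical model for illustration only: sizes are abstract units and
    no real GPU memory is touched.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.on_device = {}  # task name -> resident size (oldest first)
        self.on_host = {}    # task name -> size checkpointed to host memory

    def used(self):
        return sum(self.on_device.values())

    def run(self, task, size):
        """Make `task` resident, checkpointing other tasks to host if needed."""
        if size > self.capacity:
            raise MemoryError("a single task exceeds device capacity")
        # If the task was checkpointed earlier, this call restores it.
        self.on_host.pop(task, None)
        self.on_device.pop(task, None)
        # Evict the oldest resident tasks until the new one fits.
        while self.used() + size > self.capacity:
            victim = next(iter(self.on_device))
            self.on_host[victim] = self.on_device.pop(victim)
        self.on_device[task] = size


mem = ToyDeviceMemory(capacity=10)
mem.run("a", 6)
mem.run("b", 6)  # does not fit next to "a", so "a" is checkpointed to host
print(mem.on_device, mem.on_host)
```

In this model the two tasks together request 12 units on a 10-unit device, yet both can make progress because only the running task has to be resident.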

    A posteriori error bounds for the block-Lanczos method for matrix function approximation

    We extend the error bounds from [SIMAX, Vol. 43, Iss. 2, pp. 787-811 (2022)] for the Lanczos method for matrix function approximation to the block algorithm. Numerical experiments suggest that our bounds are fairly robust to changes in block size and have the potential for use as a practical stopping criterion. Further experiments work towards a better understanding of how certain hyperparameters should be chosen to maximize the quality of the error bounds, even in the previously studied block-size-one case.
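For orientation, the approximation whose error such bounds control can be sketched in the block-size-one case. The following is a minimal NumPy sketch of the standard Lanczos approximation to f(A)b for a symmetric matrix A. It is not the paper's block algorithm and does not compute its error bounds. The function name `lanczos_fa` and the use of full reorthogonalization are assumptions made for illustration.

```python
import numpy as np


def lanczos_fa(A, b, k, f):
    """Approximate f(A) @ b with k Lanczos steps; A must be symmetric.

    Illustrative sketch (block size one). `f` acts elementwise on an
    array of eigenvalues, e.g. np.exp.
    """
    n = b.shape[0]
    Q = np.zeros((n, k))      # orthonormal Krylov basis
    alpha = np.zeros(k)       # diagonal of the tridiagonal matrix T
    beta = np.zeros(k - 1)    # off-diagonal of T
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        w = A @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        # Orthogonalize against all previous basis vectors (an assumption
        # for numerical stability; the classical method uses only two).
        w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    theta, S = np.linalg.eigh(T)
    # f(T) e_1 = S diag(f(theta)) S^T e_1, and S^T e_1 is the first row of S.
    fT_e1 = S @ (f(theta) * S[0, :])
    return np.linalg.norm(b) * (Q @ fT_e1)
```

Only k matrix-vector products with A are needed, and f is applied to the small k-by-k tridiagonal matrix T rather than to A itself. A stopping criterion of the kind mentioned in the abstract would decide when k is large enough.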

    Evaluation of diffuse mismatch model for phonon scattering at disordered interfaces

    Diffuse phonon scattering strongly affects phonon transport through a disordered interface. The often-used diffuse mismatch model assumes that phonons lose memory of their origin after being scattered by the interface. Using mode-resolved atomic Green's function simulation, we demonstrate that diffuse phonon scattering by a single disordered interface cannot make a phonon lose its memory, and thus the applicability of the diffuse mismatch model is limited. An analytical expression for the diffuse scattering probability based on the continuum approximation is also derived and shown to work reasonably well at low frequencies. Comment: 13 pages, 7 figures
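The memory-loss assumption has a concrete consequence: the transmission probability depends only on the phonon states available on the two sides, so the probabilities for the two directions sum to one. A minimal Python sketch, assuming the textbook Debye (low-frequency, linear-dispersion) form of the diffuse mismatch model, in which each acoustic branch contributes in proportion to v^-2. The function name and the velocity values are illustrative placeholders, not data from the paper.

```python
def dmm_transmission(v_from, v_to):
    """Debye-limit diffuse mismatch transmission probability.

    v_from, v_to: acoustic branch velocities (m/s) on the incident side
    and on the far side of the interface.
    """
    s_from = sum(v ** -2 for v in v_from)
    s_to = sum(v ** -2 for v in v_to)
    # A phonon that has lost memory of its origin leaves toward each side
    # in proportion to that side's share of available states.
    return s_to / (s_from + s_to)


# Placeholder velocities (one longitudinal, two transverse branches per side).
side_a = [8000.0, 5000.0, 5000.0]
side_b = [5000.0, 3000.0, 3000.0]

t_ab = dmm_transmission(side_a, side_b)
t_ba = dmm_transmission(side_b, side_a)
print(t_ab, t_ba, t_ab + t_ba)
```

The identity t_ab + t_ba = 1 is exactly the memory-loss statement: transmission from one side equals reflection from the other. A simulation that finds phonons retaining memory of their origin therefore contradicts this relation.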

    Ab initio study of electron mean free paths and thermoelectric properties of lead telluride

    The last few years have witnessed a significant enhancement of the thermoelectric figure of merit of lead telluride (PbTe) via nanostructuring. Despite the experimental progress, the current understanding of electron transport in PbTe is based either on first-principles band structure calculations with the constant relaxation time approximation or on empirical models, both of which rely on adjustable parameters obtained by fitting experimental data. Here, we report parameter-free first-principles calculations of the electron and phonon transport properties of PbTe, including a mode-by-mode analysis of electron-phonon scattering. These yield detailed information on electron mean free paths and on the contributions of electrons and phonons with different mean free paths to the thermoelectric transport properties of PbTe. Such information will help to rationalize the use and optimization of nanostructures to achieve a high thermoelectric figure of merit.

    Dirac-Electrons-Mediated Magnetic Proximity Effect in Topological Insulator / Magnetic Insulator Heterostructures

    The possible realization of a dissipationless chiral edge current in a topological insulator / magnetic insulator heterostructure is based on the condition that the magnetic proximity exchange coupling at the interface is dominated by the Dirac surface states of the topological insulator. Here we report a polarized neutron reflectometry observation of a Dirac-electron-mediated magnetic proximity effect in a heterostructure of the bulk-insulating topological insulator (Bi$_{0.2}$Sb$_{0.8}$)$_{2}$Te$_{3}$ and the magnetic insulator EuS. We are able to maximize the proximity-induced magnetism by applying an electrical back gate to tune the Fermi level of the topological insulator close to the charge neutral point. A phenomenological model based on diamagnetic screening is developed to explain the suppressed proximity-induced magnetism at high carrier density. Our work paves the way to utilizing the magnetic proximity effect at the topological insulator / magnetic insulator hetero-interface for low-power spintronic applications. Comment: 5 pages main text with 4 figures; 2 pages supplemental materials; suggestions and discussions are welcome

    Exposure time relevance of response to nitrite exposure: Insight from transcriptional responses of immune and antioxidant defense in the crayfish, Procambarus clarkii

    To understand the toxic effects of nitrite exposure on crayfish, the expression of genes involved in the immune system, the antioxidant defense, and heat shock protein 70 (HSP70) was measured in the hepatopancreas and hemocytes of Procambarus clarkii after 12, 24, and 48 h of exposure to different nitrite concentrations. After 24 h, nitrite exposure up-regulated the mRNA levels of cytoplasmic Mn superoxide dismutase (cMn-SOD), catalase (CAT), glutathione peroxidase (GPx), and glutathione-S-transferase (GST). At 48 h, nitrite exposure decreased the mRNA levels of mitochondrial Mn-SOD (mMn-SOD), CAT, and GPx. High concentrations of nitrite at 48 h of exposure decreased the expression of β-1,3-glucan-binding protein in the hepatopancreas and of lysozyme in hemocytes. Nitrite exposure had little effect on HSP70 expression in hemocytes. Overall clustering analysis showed that 24 h of nitrite exposure caused the strongest transcriptional responses. Our study indicates that the response of P. clarkii to acute nitrite exposure is dependent on exposure time. These results will help to clarify the dynamic response pattern of crustaceans to nitrite pollution and improve our understanding of the toxicological mechanisms of nitrite in crustaceans.

    rad21 Is Involved in Corneal Stroma Development by Regulating Neural Crest Migration

    Previously, we identified RAD21(R450C) from a peripheral sclerocornea pedigree. Injection of this rad21 variant mRNA into Xenopus laevis embryos disrupted the organization of corneal stroma fibrils. To understand the mechanisms of RAD21-mediated corneal stroma defects, gene expression and chromosome conformation analyses were performed using cells from family members affected by peripheral sclerocornea. Both the gene expression and the chromosome conformation of cell adhesion genes were affected in cells carrying the heterozygous rad21 variant. Since cell migration is essential in early embryonic development and sclerocornea is a congenital disease, we studied neural crest migration during cornea development in X. laevis embryos. In X. laevis embryos injected with rad21 mutant mRNA, neural crest migration was disrupted, and the number of neural crest-derived periocular mesenchymes decreased significantly in the corneal stroma region. Our data indicate that the RAD21(R450C) variant contributes to peripheral sclerocornea by modifying chromosome conformation and gene expression, thereby disturbing neural crest cell migration, which suggests that RAD21 plays a key role in corneal stroma development.